Memory Allocation Discipline Example
6 min read🧪 1️⃣ BenchmarkDotNet — Measuring Allocation Discipline
This microbenchmark compares two implementations of tick parsing:
- Naive: uses
string.Split()anddouble.Parse() - Optimized: uses
Span<byte>+Utf8Parser(zero allocations)
📄 TickParsingBenchmarks.cs
using System;
using System.Buffers;
using System.Buffers.Text;
using System.Text;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
[MemoryDiagnoser] // shows allocations in bytes per operation
public class TickParsingBenchmarks
{
private readonly string tickLine = "EURUSD,1.07432,1.07436";
[Benchmark(Baseline = true)]
public (string, double, double) NaiveParse()
{
var parts = tickLine.Split(',');
var symbol = parts[0];
var bid = double.Parse(parts[1]);
var ask = double.Parse(parts[2]);
return (symbol, bid, ask);
}
[Benchmark]
public (string, double, double) SpanParse()
{
ReadOnlySpan<byte> span = Encoding.ASCII.GetBytes(tickLine);
int firstComma = span.IndexOf((byte)',');
int secondComma = span.Slice(firstComma + 1).IndexOf((byte)',') + firstComma + 1;
string symbol = Encoding.ASCII.GetString(span[..firstComma]);
Utf8Parser.TryParse(span[(firstComma + 1)..secondComma], out double bid, out _);
Utf8Parser.TryParse(span[(secondComma + 1)..], out double ask, out _);
return (symbol, bid, ask);
}
public static void Main() => BenchmarkRunner.Run<TickParsingBenchmarks>();
}
⚙️ Run it:
dotnet add package BenchmarkDotNet
dotnet run -c Release
🧾 Expected results (typical output):
| Method | Mean | Allocated |
|----------- |-----------:|------------:|
| NaiveParse | 1.200 μs | 1.24 KB |
| SpanParse | 0.245 μs | 32 B |
💡 Interpretation:
- The optimized version is ~5× faster.
- It reduces allocations from ~1.2 KB → ~32 bytes per tick.
- Over 1M ticks/sec, that’s ~1.2 GB less allocation per second 🤯 — huge difference for a trading backend.
💹 2️⃣ Realistic Tick Processor Example
Now let’s build a GC-efficient Tick parser — something you can confidently mention if they ask, “How would you design a real-time price feed handler?”
📄 TickProcessor.cs
using System;
using System.Buffers;
using System.Buffers.Text;
using System.Text;
public readonly struct Tick
{
public string Symbol { get; }
public double Bid { get; }
public double Ask { get; }
public Tick(string symbol, double bid, double ask)
{
Symbol = symbol;
Bid = bid;
Ask = ask;
}
public override string ToString() => $"{Symbol}: {Bid:F5}/{Ask:F5}";
}
public class TickProcessor
{
private readonly ArrayPool<byte> _bufferPool = ArrayPool<byte>.Shared;
public void ProcessBatch(string[] rawTicks)
{
foreach (var tickStr in rawTicks)
{
// Rent a buffer (to avoid allocating new byte[] each time)
var buffer = _bufferPool.Rent(256);
try
{
int bytesWritten = Encoding.ASCII.GetBytes(tickStr, buffer);
var span = new ReadOnlySpan<byte>(buffer, 0, bytesWritten);
var tick = ParseTick(span);
OnTick(tick);
}
finally
{
_bufferPool.Return(buffer);
}
}
}
private static Tick ParseTick(ReadOnlySpan<byte> span)
{
// EURUSD,1.07432,1.07436
int firstComma = span.IndexOf((byte)',');
int secondComma = span.Slice(firstComma + 1).IndexOf((byte)',') + firstComma + 1;
string symbol = Encoding.ASCII.GetString(span[..firstComma]);
Utf8Parser.TryParse(span[(firstComma + 1)..secondComma], out double bid, out _);
Utf8Parser.TryParse(span[(secondComma + 1)..], out double ask, out _);
return new Tick(symbol, bid, ask);
}
private void OnTick(in Tick tick)
{
// Simulate publishing or processing the tick
Console.WriteLine(tick);
}
}
public static class Program
{
public static void Main()
{
var ticks = new[]
{
"EURUSD,1.07432,1.07436",
"GBPUSD,1.24587,1.24592",
"USDJPY,151.229,151.238",
};
var processor = new TickProcessor();
processor.ProcessBatch(ticks);
}
}
💡 Key improvements explained
| Improvement | Why it matters |
|---|---|
ArrayPool<byte>.Shared | Reuses buffers, avoids LOH churn |
ReadOnlySpan<byte> | Zero-copy slicing of incoming data |
Utf8Parser | Parses numeric values directly from bytes (no string allocations) |
readonly struct Tick | Stack-friendly immutable type, no GC tracking |
in Tick (if used) | Passes struct by ref → no copying |
🧩 Memory profile
- ✅ Only one small string allocation per tick (
Symbol) - ✅ No arrays or temporary strings per line
- ✅ All other memory reused via pool
- ✅ Negligible GC activity — steady-state latency
🧠 Discussion points for your interview
When asked “How do you ensure your system stays fast under high load?” — say:
“I design for allocation discipline — especially in tight loops. For example, in our tick processor, we rent buffers from
ArrayPool<T>, parse withSpan<byte>andUtf8Parserto avoid string and array allocations, and use small readonly structs for data. That keeps all transient data in Gen 0 and prevents Gen 2 pressure or LOH fragmentation. In load tests, we confirmed negligible GC activity and stable latency even at millions of ticks per second.”
✅ Pro tip
You can mention:
“In production, I monitor
dotnet-counters— ifGen 2 GC Countincreases, that’s a red flag that something’s allocating too much. Then I usedotnet-traceordotMemoryto find the source.”
Would you like me to extend this by showing the async version — i.e., reading ticks from a NetworkStream using System.IO.Pipelines (zero-copy streaming, ideal for high-throughput trading systems)? That’s exactly the kind of system might ask you to describe.
Questions & Answers
Split vs Span parsing?A: It shows the optimized implementation is faster and uses dramatically fewer allocations (tens of bytes vs kilobytes per tick). That reduction scales to gigabytes saved per second in production.
Utf8Parser preferred over double.Parse here?A: Utf8Parser operates directly on byte spans, avoiding string allocations and culture-dependent parsing. It’s ideal for fixed-format protocols and keeps parsing allocation-free.
ArrayPool<byte> help batch processing?A: Each tick lines uses the same reusable buffer instead of creating a new byte array. Returning the buffer keeps the LOH clean and ensures steady-state memory usage regardless of batch size.
Tick a readonly struct?A: It keeps the data inline, prevents accidental mutation, and avoids heap allocations when passing ticks around. Combined with in Tick parameters, we avoid copies even for frequent calls.
in Tick on the OnTick method?A: It passes the struct by readonly reference, eliminating defensive copies for large structs and preserving immutability guarantees without GC cost.
A: Use channels or System.Threading.Channels to fan out parsed ticks, but keep parsed structs allocation-free. Each consumer should reuse buffers or work with spans until serialization boundaries.
A: Run the benchmark with MemoryDiagnoser, inspect ETW events, or instrument code with GC.GetAllocatedBytesForCurrentThread() to ensure the optimized method stays within expected allocation budgets.
A: The pool will grow and eventually allocate new arrays, defeating the purpose and potentially causing memory leaks. Always return inside finally blocks to ensure deterministic cleanup.
A: Replace ASCII parsing with direct span slicing over binary fields, using BinaryPrimitives or custom parsing logic; the same pooling and span principles apply.
A: Emit structured logs with message templates, avoid string concatenation, and aggregate metrics using counters/gauges. When necessary, log summaries rather than per-tick details to keep the hot path clean.